Exploiting long distance collocational relations in predictive typing

Authors

  • Johannes Matiasek
  • Marco Baroni

Abstract

In this paper, we report on preliminary experiments in which we tried to improve the performance of a state-of-the-art Predictive Typing system for German by adding a collocation-based prediction component. This component tries to exploit the fact that texts have a topic and are semantically coherent: the appearance of a certain word in a text can be a cue that other, semantically related words are likely to appear soon. The collocation-based module captures this kind of topical/semantic relatedness by relying on statistics about the co-occurrence of words within a large window of text in the training corpus. Our current experimental results indicate that the collocation-based prediction module has a small but consistent positive effect on the performance of the system.
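
The abstract does not specify how co-occurrence statistics are scored or combined with the base predictor. As an illustration only, the Python sketch below assumes (1) symmetric word-pair counts collected within a fixed token window, (2) a cue score for each candidate word obtained by summing its counts with the words already typed, and (3) linear interpolation of that score with a base predictor's probabilities (for example, an n-gram model). The function names `cooccurrence_counts`, `collocation_distribution`, and `rerank`, the window size, and the weight `lam` are all hypothetical; this is not the authors' method.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window):
    """Symmetric counts of word pairs occurring within `window` tokens of each other.
    The window size is a hypothetical parameter, not a value from the paper."""
    pairs = defaultdict(Counter)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:  # skip self-pairs such as repeated "the"
                pairs[w][v] += 1
                pairs[v][w] += 1
    return pairs


def collocation_distribution(pairs, context, vocab):
    """Normalized cue strength: how strongly the recent context words point to each candidate."""
    scores = {c: sum(pairs.get(w, {}).get(c, 0) for w in context) for c in vocab}
    total = sum(scores.values())
    if total == 0:
        return {c: 0.0 for c in vocab}  # no cue: leaves the base ranking unchanged
    return {c: s / total for c, s in scores.items()}


def rerank(base_probs, colloc_probs, lam=0.2):
    """Hypothetical linear interpolation of a base predictor with the collocation component."""
    mixed = {c: (1 - lam) * p + lam * colloc_probs.get(c, 0.0) for c, p in base_probs.items()}
    return sorted(mixed, key=mixed.get, reverse=True)


corpus = ("the striker scored a goal and the goalkeeper saved the penalty "
          "while the bank raised interest rates").split()
pairs = cooccurrence_counts(corpus, window=5)
colloc = collocation_distribution(pairs, ["striker", "goalkeeper"], ["goal", "rates", "penalty"])
print(rerank({"goal": 0.30, "rates": 0.35, "penalty": 0.35}, colloc))  # ['goal', 'penalty', 'rates']
```

In this toy example the base probabilities alone would rank "goal" last, but the earlier context words "striker" and "goalkeeper" co-occur with it, so the collocation component moves it to the top. Raw counts are used here only for simplicity; a practical system would more likely use an association measure and filter out function words.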

Similar resources

Extending WordNet with Fine-Grained Collocational Information via Supervised Distributional Learning

WordNet is probably the best known lexical resource in Natural Language Processing. While it is widely regarded as a high quality repository of concepts and semantic relations, updating and extending it manually is costly. One important type of relation which could potentially add enormous value to WordNet is the inclusion of collocational information, which is paramount in tasks such as Machin...

TANGO: Bilingual Collocational Concordancer

In this paper, we describe TANGO as a collocational concordancer for looking up collocations. The system was designed to answer user’s query of bilingual collocational usage for nouns, verbs and adjectives. We first obtained collocations from the large monolingual British National Corpus (BNC). Subsequently, we identified collocation instances and translation counterparts in the bilingual corpu...

Improving Lexical Databases with Collocational Information: Data from Portuguese

This article focuses on ongoing work done for Portuguese concerning the phenomenon of lexical co-occurrence known as collocation (cf. Cruse, 1986, inter al.). Instances of the syntactic variety formed by noun plus adjective have been especially observed. Collocational instances are not lexical entries, and thus should not be stored in the lexicon as multiword lexical units. Their processing can...

Co-Occurrence Patterns among Collocations: A Tool for Corpus-Based Lexical Knowledge Acquisition

One of the main problems for applied natural language processing is gaps in the lexicon, including missing words and word senses, and inadequate descriptions of word use in context. Traditional lexicography has similar concerns. The availability of large, on-line text corpora provides a straightforward tool for enlarging the stock of words included in a lexicon. The identification of additional...

The Economies of Scale in Iran Manufacturing Establishments

One of the topics after two decades of applying import substitution policy in Iran manufacturing sector is the importance of industrial export expansion and foreign relations. The main impetus to this policy transfer is the market expansion and potential gains of exploiting the economies of scale and technical upgrades. Based on this argument this research estimates the efficient scale and gain...

Publication date: 2003